Data Deduplication With Random Substitutions
Authors
Hao Lou, Farzad Farnoud
Abstract
Data deduplication saves storage space by identifying and removing repeats in the data stream. Compared with traditional compression methods, deduplication schemes are more time efficient and are thus widely used in large-scale storage systems. In this paper, we provide an information-theoretic analysis of the performance of deduplication algorithms on data streams in which repeats are not exact. We introduce a source model in which probabilistic substitutions are considered: more precisely, each symbol in a repeated string is substituted with a given edit probability. Deduplication under both the fixed-length scheme and the variable-length scheme is studied. The fixed-length deduplication algorithm is shown to be unsuitable for the proposed source model, as it does not take edits into account. Two modifications are proposed and shown to have performance within a constant factor of optimal, given knowledge of the source model parameters. We also study the conventional variable-length deduplication algorithm and show that, as the entropy of the edit process becomes smaller, the size of the compressed string vanishes relative to the length of the uncompressed string, leading to high compression ratios.
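To make the setting concrete, the Python sketch below simulates a toy version of this source model and runs exact-match fixed-length deduplication over it. The block length, edit probability, and repeat probability are illustrative values rather than parameters from the paper, and the code is a minimal sketch, not the authors' scheme.

import random

# A minimal sketch of the setting analyzed above (not the authors' exact
# source model): a binary source emits blocks, and each new block is either
# fresh random data or an approximate repeat of an earlier block in which
# every symbol is substituted independently with edit probability DELTA.
DELTA = 0.05        # edit (substitution) probability -- illustrative value
BLOCK_LEN = 64      # source block length -- illustrative value

def approximate_repeat(block, delta=DELTA):
    # Copy a block, flipping each binary symbol with probability delta.
    return tuple(b ^ (random.random() < delta) for b in block)

def generate_source(num_blocks, repeat_prob=0.5):
    # Emit a stream of blocks, each either fresh or an inexact repeat.
    blocks = [tuple(random.randint(0, 1) for _ in range(BLOCK_LEN))]
    for _ in range(num_blocks - 1):
        if random.random() < repeat_prob:
            blocks.append(approximate_repeat(random.choice(blocks)))
        else:
            blocks.append(tuple(random.randint(0, 1) for _ in range(BLOCK_LEN)))
    return blocks

def fixed_length_dedup(stream, chunk_len=BLOCK_LEN):
    # Exact-match fixed-length deduplication: store each distinct chunk once
    # and represent the stream as pointers into the chunk dictionary.
    symbols = [s for block in stream for s in block]
    chunks = [tuple(symbols[i:i + chunk_len])
              for i in range(0, len(symbols), chunk_len)]
    store = {}
    pointers = [store.setdefault(c, len(store)) for c in chunks]
    return store, pointers

random.seed(0)
store, pointers = fixed_length_dedup(generate_source(200))
# With DELTA > 0, most repeats contain at least one substitution and are
# treated as brand-new chunks, so the dictionary barely shrinks.
print(f"{len(pointers)} chunks in stream, {len(store)} stored uniquely")

Because a single substituted symbol changes a chunk's identity entirely under exact matching, compression degrades as soon as repeats are inexact, which illustrates why the unmodified fixed-length algorithm is unsuitable for this source model.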
Similar Resources
Distributed Data Deduplication
Data deduplication refers to the process of identifying tuples in a relation that refer to the same real world entity. The complexity of the problem is inherently quadratic with respect to the number of tuples, since a similarity value must be computed for every pair of tuples. To avoid comparing tuple pairs that are obviously non-duplicates, blocking techniques are used to divide the tuples in...
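As a rough illustration of the blocking idea described above (not the algorithm from this particular paper), the sketch below groups tuples by a cheap blocking key and computes similarity only within each group; the record fields, key choice, and similarity test are all hypothetical.

from itertools import combinations

# Blocking avoids the quadratic all-pairs comparison: tuples are first
# grouped by an inexpensive key, and the costly similarity check runs
# only on pairs that share a block.
def blocking_key(record):
    # Toy key: lowercase first three letters of the name field.
    return record["name"][:3].lower()

def similar(a, b):
    # Placeholder similarity test; real systems use edit distance, etc.
    return a["name"].lower() == b["name"].lower() and a["zip"] == b["zip"]

def dedup_with_blocking(records):
    blocks = {}
    for r in records:
        blocks.setdefault(blocking_key(r), []).append(r)
    # Candidate duplicate pairs come only from within each block.
    return [(a, b)
            for block in blocks.values()
            for a, b in combinations(block, 2)
            if similar(a, b)]

people = [
    {"name": "Alice Smith", "zip": "22904"},
    {"name": "alice smith", "zip": "22904"},   # duplicate of the first
    {"name": "Bob Jones",   "zip": "10001"},
]
print(dedup_with_blocking(people))   # -> one matching pair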
PerfectDedup: Secure Data Deduplication
With the continuous increase of cloud storage adopters, data deduplication has become a necessity for cloud providers. By storing a unique copy of duplicate data, cloud providers greatly reduce their storage and data transfer costs. Unfortunately, deduplication introduces a number of new security challenges. We propose PerfectDedup, a novel scheme for secure data deduplication, which takes into...
Similarity Based Deduplication with Small Data Chunks
Large backup and restore systems may have a petabyte or more data in their repository. Such systems are often compressed by means of deduplication techniques that partition the input text into chunks and store recurring chunks only once. One of the approaches is to use hashing methods to store fingerprints for each data chunk, detecting identical chunks with very low probability of collisions...
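A minimal sketch of the fingerprint approach described above, assuming fixed-size chunks and SHA-256 fingerprints (both illustrative choices): each chunk's bytes are stored only the first time its fingerprint is seen, and a data stream is kept as a recipe of fingerprints.

import hashlib

CHUNK_SIZE = 4096   # illustrative chunk size, not from the paper

def fingerprint(chunk: bytes) -> str:
    # Short identifier for a chunk; collisions are cryptographically unlikely.
    return hashlib.sha256(chunk).hexdigest()

def store_stream(data: bytes, repository: dict) -> list:
    # Split data into fixed-size chunks, store new chunks once, and
    # return the list of fingerprints needed to rebuild the stream.
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = fingerprint(chunk)
        repository.setdefault(fp, chunk)   # recurring chunks stored once
        recipe.append(fp)
    return recipe

def restore(recipe: list, repository: dict) -> bytes:
    return b"".join(repository[fp] for fp in recipe)

repo = {}
backup = b"header" + b"\x00" * 8192 + b"payload" + b"\x00" * 8192
recipe = store_stream(backup, repo)
assert restore(recipe, repo) == backup
print(f"{len(recipe)} chunks referenced, {len(repo)} stored")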
Deduplication in Hybrid Cloud with Secure Data
Deduplication, also called the single-instance technique, removes redundant data and stores only the original copy, saving storage space while protecting sensitive data. Data security and control over access to particular data are very important today, hence deduplication features have been widely used in cloud storage systems. There was a drawback in previous work wher...
Cloud Based Data Deduplication with Secure Reliability
To eliminate duplicate copies of data, we use the data de-duplication process. It is used in cloud storage to minimize storage space and upload bandwidth: only one copy of each file is stored in the cloud, and it can be shared by many users. The de-duplication process helps to improve storage utilization. However, a challenge of privacy for sensitive data also arises. The aim of this pap...
Journal
Journal Title: IEEE Transactions on Information Theory
Year: 2022
ISSN: 0018-9448 (print), 1557-9654 (electronic)
DOI: https://doi.org/10.1109/tit.2022.3176778